FitNets: Hints for Thin Deep Nets

Authors

  • Adriana Romero
  • Nicolas Ballas
  • Samira Ebrahimi Kahou
  • Antoine Chassang
  • Carlo Gatta
  • Yoshua Bengio
Abstract

While depth tends to improve network performance, it also makes gradient-based training more difficult, since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher’s intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times fewer parameters outperforms a larger, state-of-the-art teacher network.
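To make the hint idea concrete, the following is a minimal sketch of the intermediate-layer objective the abstract describes, assuming PyTorch as the framework; the class name HintRegressor, the 1x1-convolution regressor, and all shapes are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintRegressor(nn.Module):
    """Extra parameters that map the (thinner) student hidden layer to
    the size of the teacher's hint layer, since their widths differ."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 convolution is one simple regressor choice for
        # convolutional feature maps (an assumption of this sketch).
        self.regressor = nn.Conv2d(student_channels, teacher_channels,
                                   kernel_size=1)

    def forward(self, student_hidden: torch.Tensor) -> torch.Tensor:
        return self.regressor(student_hidden)

def hint_loss(student_hidden, teacher_hint, regressor):
    """L2 distance between the regressed student features and the
    teacher's intermediate representation (the hint)."""
    return F.mse_loss(regressor(student_hidden), teacher_hint)

# Illustrative usage with dummy feature maps: the student is thinner
# (32 channels) than the teacher (64 channels) at the guided layer.
student_feat = torch.randn(8, 32, 16, 16, requires_grad=True)
teacher_feat = torch.randn(8, 64, 16, 16)  # teacher activations, fixed
reg = HintRegressor(32, 64)
hint_loss(student_feat, teacher_feat, reg).backward()
```

In this reading, the hint term is minimized jointly with the usual distillation loss on the soft outputs, and the regressor is discarded once the student is trained.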


Similar Resources

All you need is a good init

Layer-sequential unit-variance (LSUV) initialization – a simple method for weight initialization for deep net learning – is proposed. The method consists of two steps. First, pre-initialize weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be e...

Full text
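Since the snippet above spells out the two LSUV steps, a short sketch may help; it assumes PyTorch, and the tolerance, iteration cap, and helper name lsuv_init are assumptions of this sketch, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def lsuv_init(model: nn.Sequential, batch: torch.Tensor,
              tol: float = 0.1, max_iters: int = 10):
    x = batch
    for layer in model:
        if isinstance(layer, (nn.Conv2d, nn.Linear)):
            # Step 1: pre-initialize the weights with an orthonormal matrix.
            nn.init.orthogonal_(layer.weight)
            if layer.bias is not None:
                # Zero the bias so rescaling the weights rescales the
                # output exactly (a simplifying choice in this sketch).
                nn.init.zeros_(layer.bias)
            # Step 2: rescale until this layer's output variance is ~1.
            for _ in range(max_iters):
                var = layer(x).var().item()
                if abs(var - 1.0) < tol:
                    break
                layer.weight.data /= var ** 0.5
        x = layer(x)  # propagate the batch to initialize the next layer

# Illustrative usage on a small dummy network and a random batch.
net = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
lsuv_init(net, torch.randn(256, 100))
```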

What is the Problem with Proof Nets for Classical Logic?

This paper is an informal (and non-exhaustive) overview of some existing notions of proof nets for classical logic, and gives some hints as to why they might be considered unsatisfactory.

Full text

Generalization and Expressivity for Deep Nets

Along with the rapid development of deep learning in practice, theoretical explanations for its success have become urgent. Generalization and expressivity are two widely used measurements to quantify the theoretical behavior of deep learning. Expressivity focuses on finding functions that are expressible by deep nets but cannot be approximated by shallow nets with a similar number of neurons. It usually ...

Full text

Some Preliminary Hints on Formalizing UML with Object Petri Nets

Petri nets have already been used to formalize UML, and they have already shown – at least partially – what can be done in terms of analysis and simulation. Nevertheless, “conventional” Petri nets, like P/T nets and color nets, are not always enough to efficiently formalize the behavior associated with UML models when specifications rely heavily on typical object-oriented features, like inheritance...

Full text

Learning a Skill-Teaching Curriculum with Dynamic Bayes Nets

We propose an intelligent tutoring system that constructs a curriculum of hints and problems in order to teach a student skills with a rich dependency structure. We provide a template for building a multi-layered Dynamic Bayes Net to model this problem and describe how to learn the parameters of the model from data. Planning with the DBN then produces a teaching policy for the given domain. We ...

Full text


Journal:
  • CoRR

Volume: abs/1412.6550
Issue:
Pages: -
Publication date: 2014